ABSTRACT

Searching for audio samples within a library can be a tedious 
and time-consuming task. In this paper, we report on the de- 
sign of a pilot automatic classification system that utilises tim- 
bral properties to automatically classify audio samples. At 
this stage of the study, we have decided to work only with 
orchestral audio samples. In addition, we conducted a per- 
ceptual experiment to evaluate the performance of the system 
across five timbral attributes: breathiness, brightness, dull- 
ness, roughness and warmth. Promising classification results 
indicate that this approach may be suitable for further work 
that could also benefit some music production tasks. 

1. INTRODUCTION 

Searching for audio samples within a large library can be a 
tedious and time-consuming task for composers and sound 
designers. New developments in music information retrieval (MIR)
can assist them in this process; for a survey of MIR systems
the reader is referred to [1]. The system presented in this
paper utilises timbre properties to automatically classify
audio samples. Timbre is a complex musical property that is
often defined as all the sound properties except pitch,
loudness and duration that allow us to distinguish and
recognise two different sounds [2]. Furthermore, several works
have demonstrated the importance of acoustic features in the
definition of this multidimensional attribute [3, 4].

Our initial aim for developing this system was to over- 
come the time-consuming task of listening to a large sound 
file database when composing. This search and listening task 
may be part of the compositional process, but it can also be a 
situation that composers would prefer to circumvent or make 
more efficient. However, a timbre-based classification system 
could also have applications for music production. For example,
it could aid sound engineers in evaluating their mixes.
The perceptual evaluation of timbral qualities utilised in this 
system could also be implemented in intelligent systems for 
music production. 

In order to make this system user-friendly, we targeted 
verbal descriptors of timbral qualities by means of their un- 
derlying acoustic correlates, which can sometimes be com- 
plex and potentially overlapping. Terms like brightness or 
roughness are words from everyday language used to describe
perceived musical timbres. These terms are more intuitive
than their acoustic correlates (e.g. spectral centroid, critical 
bands). 

2. THE CLASSIFICATION SYSTEM 

The prototype system is implemented in the Matlab environment
with a simple user interface. It currently integrates
five timbral attributes: breathiness, brightness, dullness,
roughness and warmth. We calculate the timbral index as follows:

Breathiness. Fundamental amplitude against noise content and 
the spectral slope [5]. The bigger the ratio between fundamental
amplitude and the noise content, the breathier the sound. 

Brightness. Spectral centroid [6]. The higher the spectral
centroid, the brighter the sound.

Dullness. Spectral centroid [7]. A low spectral centroid value
indicates that the sound is dull. 

Roughness. Distance between adjacent partials in critical
bandwidths and also the energy above the 6th harmonic [3].

Warmth. Spectral centroid and energy in the first three
harmonics [2]. A low spectral centroid and a high energy in the first three
harmonics indicate that the sound is warm. 
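
The spectral centroid, the correlate shared by brightness, dullness and warmth in the list above, can be illustrated as the magnitude-weighted mean frequency of a signal's spectrum. The following Python sketch is not the Matlab prototype: the function names, the naive DFT, the sample rate and the two test tones are all assumptions chosen for illustration, and a real analysis would probably use an FFT over windowed frames.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz; 0.0 for a silent frame."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(samples)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def sine(freq, sample_rate, n):
    """Hypothetical test signal: a pure tone of n samples."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # Hz, assumed for the example
FRAME = 256         # samples per frame, assumed for the example

low_tone = sine(250.0, SAMPLE_RATE, FRAME)
high_tone = sine(2000.0, SAMPLE_RATE, FRAME)

# Under the rule stated above, the tone with the higher centroid would be
# ranked as brighter and the one with the lower centroid as duller.
low_centroid = spectral_centroid(low_tone, SAMPLE_RATE)
high_centroid = spectral_centroid(high_tone, SAMPLE_RATE)
```

Both test frequencies fall exactly on DFT bins for this frame length, so each centroid lands close to the frequency of its tone.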

To use the system, the user must first define the directory 
containing the audio files. Next, the system analyses each 
file and uses the acoustic correlates mentioned previously to 
calculate its timbral index for each attribute. File names and 
timbral indexes are then stored in matrices (one matrix per 
attribute), which are sorted in ascending order at the end, to 
return the best results at the top. 
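
The storing and sorting step described above could be sketched as follows. This is an illustrative Python sketch rather than the prototype's code: the file names and index values are hypothetical, and the sort direction is exposed as a parameter because the direction that puts the best match first depends on how each index is defined.

```python
def rank_files(indexes, ascending=True):
    """Return (file_name, index) pairs sorted by timbral index.

    `indexes` maps a file name to its index for ONE attribute, mirroring
    the one-matrix-per-attribute layout described in the text.
    """
    return sorted(indexes.items(), key=lambda item: item[1],
                  reverse=not ascending)


# Hypothetical index values for a single attribute (not from the study).
example_index = {"a.wav": 1200.0, "b.wav": 3400.0, "c.wav": 800.0}

lowest_first = rank_files(example_index)
highest_first = rank_files(example_index, ascending=False)
```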

3. PILOT STUDY 

In order to evaluate the accuracy of our classification system, 
we decided to run an experiment with human participants to 
determine the correlation between the humans’ responses and 
the system’s ratings. This perceptual evaluation was also de- 
signed to test the efficiency of the acoustic analysis when 
working with polyphonic timbre (timbral mixture emerging 
from several instruments). 

3.1. Method 

Training files. As training sources for the implementation of 
the system, we used 90 sound files generated beforehand with 
Orchids, a computer-aided orchestration program developed
by IRCAM. These sound files consisted of short audio
samples of orchestration (≈ 2 to 4 seconds) and were composed
of several orchestral instruments, including violins, flutes and 
trumpets. 

Stimuli. For the listening test, we decided to use 3 sound 
files for each timbral attribute. These stimuli were taken from 
the 90 orchestral sound files mentioned previously. For each 
attribute, we selected the files rated as the best, medium
and worst results according to the classification performed
by the system. 
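
The selection of three stimuli per attribute can be sketched as below. The text does not state how the "medium" file was chosen, so taking the middle entry of the ranking is an assumption, as are the function name and the placeholder file names.

```python
def pick_stimuli(ranked):
    """Pick the first, middle and last entries of a best-first ranking.

    Assumption: "medium" is read as the middle position of the ranking.
    """
    if len(ranked) < 3:
        raise ValueError("need at least three ranked files")
    return ranked[0], ranked[len(ranked) // 2], ranked[-1]


# Placeholder names standing in for the 90 ranked orchestral files.
ranked_files = [f"excerpt_{i:02d}.wav" for i in range(90)]
best, medium, worst = pick_stimuli(ranked_files)
```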

Procedure. We conducted the perceptual evaluation with 
20 individuals, in the same room and using the same playback 
equipment (circumaural headphones). Each participant had to 
listen to 15 sound samples in total, which consisted of 3 sound 
samples (≈ 2 to 4 seconds each) for each timbral attribute.
For each attribute, participants were asked to rate the 3 au- 
dio samples using the Verbal Attribute Magnitude Estimation 
(VAME) rating method [8], which quantifies the applicability
of the descriptor (e.g., bright/not bright), on a five-point
Likert scale. 


3.2. Results and Discussions 

Due to space limitations, graphic representations of this
study's results are available on the internet at the following
address: http://goo.gl/YkdOA6

For the timbral attributes breathiness, brightness and dullness,
participants' responses and the system's ratings were similar
for each sample. There appears to be a correlation between
the participants' responses and the system's classification. The
mean and standard deviation for each sample for breathiness, 
brightness and dullness support this interpretation. 
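
One way to make this kind of comparison concrete is to summarise the ratings per sample and check whether the mean ratings follow the system's best-to-worst order. The Python sketch below uses invented five-point ratings purely for illustration; they are not data from the study, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical five-point ratings for one attribute, NOT study data.
ratings = {
    "best": [5, 4, 5, 4],
    "medium": [3, 3, 4, 2],
    "worst": [1, 2, 1, 2],
}


def summarise(ratings_by_sample):
    """Map each sample label to (mean, sample standard deviation)."""
    return {label: (mean(values), stdev(values))
            for label, values in ratings_by_sample.items()}


def order_agrees(summary, system_order):
    """True if mean ratings strictly decrease along the system's order."""
    means = [summary[label][0] for label in system_order]
    return all(a > b for a, b in zip(means, means[1:]))


summary = summarise(ratings)
agrees = order_agrees(summary, ["best", "medium", "worst"])
```

A check of this kind only compares orderings; with three samples per attribute it says nothing about statistical significance.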

For the attribute warmth, we can note that participants'
responses were similar to the system's rating for the sample
identified as best result. However, there is a difference for the 
two other samples. This will require further investigation on 
our calculations for the warmth index and also on the sound 
samples selected in order to identify the reason participants 
rated these two samples similarly while there was a difference 
in the system’s rating. 

Finally, for the attribute roughness, participants rated the 
sample identified as medium result as the roughest sound while 
the system suggested a different sample. We can note that par- 
ticipants’ responses and the system’s rating were similar for 
the sample identified as the least rough. However, these re- 
sults do not validate our calculations for the attribute rough- 
ness and further investigations are required to improve our 
system’s rating for this attribute. 


4. CONCLUSIONS AND FURTHER WORK 

In this paper, we have presented a computer system developed 
to classify musical excerpts according to five verbal descriptors
of timbral qualities. The prototype classification system is
developed in Matlab with five timbral attributes currently
implemented: breathiness, brightness, dullness, roughness and
warmth. To evaluate the performance of the system, we ran
a preliminary perceptual study with 20 individuals, using or- 
chestral audio samples generated by Orchids. However, the 
same method could be applied with any source sound set (e.g. 
synthesised sounds). 

Listeners’ ratings were in good agreement with the system’s
classification for the attributes breathiness, brightness
and dullness. For the attributes roughness and warmth, the 
results showed a high amount of inter-participant variation 
in the listeners’ responses. This could be addressed by
increasing the number of participants, but at the pilot stage of
this research the large number of participants that would
be required (likely hundreds) is not practical. Therefore we
propose that this variation be addressed in further work by 
improving the classification system in response to participant 
ratings. 

Nevertheless, these early results are promising with re- 
spect to the success of four of the five selected attributes, par- 
ticularly given the absence of fully agreed or realised metrics. 
We think such a system facilitating the use of a particular au- 
dio descriptor to estimate attributes could be useful in musical 
composition as well as in intelligent music production tasks. 